Multiple variables but not a multivariate plot:
Source: https://www.cdc.gov/nchs/images/databriefs/201-250/db217_fig1.png
| Age | Favorite |
|---|---|
| young | bubble gum |
| old | coffee |
| young | bubble gum |
| old | coffee |
| young | bubble gum |
| young | bubble gum |
| old | coffee |
| young | coffee |
| young | bubble gum |
| old | coffee |
| young | bubble gum |
| old | bubble gum |
| old | bubble gum |
| young | bubble gum |
(same problem)
Are older Americans more interested in local news than younger Americans?
We asked 34,892 U.S. adults whether they follow local news “very closely”; 34.5% said yes.
Group sizes are:
| Age | Freq |
|---|---|
| 18-29 | 2851 |
| 30-49 | 9967 |
| 50-64 | 11163 |
| 65+ | 10911 |
Source: https://www.journalism.org/2019/08/14/methodology-local-news-demographics/
If older Americans are NOT more interested in local news, what would the breakdowns look like?
| Age | Followers | Nonfollowers |
|---|---|---|
| 18-29 | 984 | 1867 |
| 30-49 | 3439 | 6528 |
| 50-64 | 3851 | 7312 |
| 65+ | 3764 | 7147 |
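The table above can be reproduced with a little arithmetic. If age and following local news were unrelated, every age group would share the overall 34.5% follow rate. The sketch below applies that rate to each group size. The variable names are illustrative only.

```r
# Group sizes from the survey methodology table
group_n <- c("18-29" = 2851, "30-49" = 9967, "50-64" = 11163, "65+" = 10911)
p_follow <- 0.345  # overall share who follow local news "very closely"

# Under "no relationship", each age group follows at the same overall rate
followers    <- round(group_n * p_follow)
nonfollowers <- group_n - followers

# Row names come from the names of group_n
expected_tbl <- cbind(Followers = followers, Nonfollowers = nonfollowers)
expected_tbl
```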
How can we graph this data?
Now let’s look at the actual data…
Null hypothesis: Age and tendency to follow local news are independent
Alternative hypothesis: Age and tendency to follow local news are NOT independent
We compare OBSERVED to EXPECTED:
```r
# Observed counts (actual survey data)
local <- data.frame(Age = c("18-29", "30-49", "50-64", "65+"),
                    Followers = c(428, 2791, 4242, 4583),
                    Nonfollowers = c(2423, 7176, 6921, 6328))
localmat <- as.matrix(local[, 2:3])
rownames(localmat) <- local$Age
X <- chisq.test(localmat, correct = FALSE)
X$observed
##       Followers Nonfollowers
## 18-29       428         2423
## 30-49      2791         7176
## 50-64      4242         6921
## 65+        4583         6328
X$expected
##       Followers Nonfollowers
## 18-29  984.1065     1866.893
## 30-49 3440.4032     6526.597
## 50-64 3853.2378     7309.762
## 65+   3766.2526     7144.747
X
## 
##  Pearson's Chi-squared test
## 
## data:  localmat
## X-squared = 997.48, df = 3, p-value < 0.00000000000000022
```
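The statistic reported by chisq.test can be rebuilt by hand. Each expected count is (row total × column total) / grand total, and Pearson's X² sums (O − E)² / E over all cells. The sketch below recomputes it and cross-checks the result against chisq.test. The names `observed`, `expected`, `x2` and `dof` are only illustrative.

```r
# Observed counts (same values as X$observed)
observed <- matrix(c(428, 2791, 4242, 4583,     # Followers
                     2423, 7176, 6921, 6328),   # Nonfollowers
                   ncol = 2,
                   dimnames = list(c("18-29", "30-49", "50-64", "65+"),
                                   c("Followers", "Nonfollowers")))

# Expected count for cell (i, j) = row total i * column total j / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: sum over all cells of (O - E)^2 / E
x2  <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)   # (4 - 1) * (2 - 1) = 3
p   <- pchisq(x2, df = dof, lower.tail = FALSE)

# Cross-check against chisq.test (no continuity correction, as above)
ct <- chisq.test(observed, correct = FALSE)
all.equal(unname(ct$statistic), x2)
```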
What if there were no relationship between the variables?
What if there were a deterministic relationship between the variables?
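The two extremes can be compared numerically. The two 2 × 2 tables below are hypothetical and made up for illustration, not survey data. In the first, both rows have the same 25% / 75% split. In the second, knowing the row tells you the column exactly.

```r
# Hypothetical table with no relationship: both groups answer "yes" 25% of the time
indep <- matrix(c(10, 20, 30, 60), nrow = 2,
                dimnames = list(Group = c("A", "B"), Answer = c("yes", "no")))

# Hypothetical deterministic table: group A always says yes, group B always no
determ <- matrix(c(50, 0, 0, 50), nrow = 2,
                 dimnames = list(Group = c("A", "B"), Answer = c("yes", "no")))

# Pearson's chi-squared statistic for each (no continuity correction)
x2_indep  <- unname(chisq.test(indep,  correct = FALSE)$statistic)
x2_determ <- unname(chisq.test(determ, correct = FALSE)$statistic)
c(no_relationship = x2_indep, deterministic = x2_determ)
```

With no relationship, observed equals expected in every cell, so X² = 0. For a 2 × 2 table, X² cannot exceed the sample size n, and the deterministic table reaches that maximum (n = 100).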
vcdExtra::Yamaguchi87
| Father | Son | Freq |
|---|---|---|
| UpNM | UpNM | 474 |
| UpNM | LoNM | 129 |
| UpNM | UpM | 87 |
| UpNM | LoM | 124 |
| UpNM | Farm | 11 |
| LoNM | UpNM | 300 |
| LoNM | LoNM | 218 |
| LoNM | UpM | 171 |
| LoNM | LoM | 220 |
| LoNM | Farm | 8 |
| UpM | UpNM | 438 |
| UpM | LoNM | 254 |
| UpM | UpM | 669 |
| UpM | LoM | 703 |
| UpM | Farm | 16 |
| LoM | UpNM | 601 |
| LoM | LoNM | 388 |
| LoM | UpM | 932 |
| LoM | LoM | 1789 |
| LoM | Farm | 37 |
| Farm | UpNM | 76 |
| Farm | LoNM | 56 |
| Farm | UpM | 125 |
| Farm | LoM | 295 |
| Farm | Farm | 191 |
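One way to turn the frequency-form table above into a two-way table and plot it is with base R. This sketch types the 25 counts in by hand rather than loading vcdExtra, and it uses the base-graphics mosaicplot() instead of vcd. The object name `mobility` is illustrative.

```r
# Counts typed in from the Father x Son table above (rows = Father, cols = Son)
occ <- c("UpNM", "LoNM", "UpM", "LoM", "Farm")
mobility <- as.table(matrix(c(474, 129,  87,  124,  11,
                              300, 218, 171,  220,   8,
                              438, 254, 669,  703,  16,
                              601, 388, 932, 1789,  37,
                               76,  56, 125,  295, 191),
                            nrow = 5, byrow = TRUE,
                            dimnames = list(Father = occ, Son = occ)))

# Row proportions: distribution of son's class given father's class
round(prop.table(mobility, margin = 1), 2)

# Base-graphics mosaic plot: column widths by Father, heights within columns by Son
mosaicplot(mobility, main = "Father's vs. son's occupational class", color = TRUE)
```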
mosaic plot = any filled rectangular plot (no white space) with consistent numbers of rows and columns, in which each small rectangle represents a unique combination of levels of the categorical variables displayed and its area is proportional to the frequency count for that combination
treemap = filled rectangular plot representing hierarchical data (fill color does not necessarily represent frequency count)
spine plot = mosaic plot with straight, parallel cuts in one dimension (“spines”) and only one variable cutting in the other direction
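The "area proportional to frequency" property follows from how a two-variable mosaic is built, and a short calculation shows why. This is a sketch of the geometry only, assuming a unit square and no spacing between tiles (plotting packages such as vcd add gaps). It reuses the observed local-news counts.

```r
# Observed local-news counts (Age x News)
counts <- matrix(c(428, 2791, 4242, 4583,
                   2423, 7176, 6921, 6328),
                 ncol = 2,
                 dimnames = list(Age = c("18-29", "30-49", "50-64", "65+"),
                                 News = c("Followers", "Nonfollowers")))

# First split: column widths proportional to each age group's share of the total
widths <- rowSums(counts) / sum(counts)

# Second split: within each column, heights proportional to conditional shares
heights <- counts / rowSums(counts)   # each row of heights sums to 1

# Tile area = width * height (widths recycles down the rows of heights)
areas <- widths * heights

# Tile areas equal the joint relative frequencies, and they fill the unit square
all.equal(areas, counts / sum(counts))
sum(areas)
```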
MASS::housing: relative frequency stacked bar charts
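Below is a minimal sketch of a relative frequency stacked bar chart built from MASS::housing. The choice of Infl (influence) and Sat (satisfaction) as the two variables is an assumption made for illustration. The sketch uses base barplot().

```r
library(MASS)  # provides the housing data frame (frequency form, with a Freq column)

# Collapse to an Infl x Sat table of counts
infl_sat <- xtabs(Freq ~ Infl + Sat, data = housing)

# Relative frequencies within each level of Infl (each row sums to 1)
rel <- prop.table(infl_sat, margin = 1)
round(rel, 2)

# Stacked bars: one bar per Infl level, segments show the share in each Sat level
barplot(t(rel), legend.text = TRUE, xlab = "Influence", ylab = "Proportion")
```

All bars reach 1, so differences in group size are hidden. Only the conditional distributions are compared.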
p. 145
| Name | Age | Favorite | Music |
|---|---|---|---|
| Emma | young | bubble gum | rock |
| Linda | old | coffee | classical |
| Emily | young | bubble gum | rock |
| Deborah | old | coffee | classical |
| Charlotte | young | bubble gum | rock |
| Olivia | young | bubble gum | classical |
| Barbara | old | coffee | rock |
| Sophia | young | coffee | classical |
| Ava | young | bubble gum | rock |
| Patricia | old | coffee | classical |
| Isabella | young | bubble gum | rock |
| Nancy | old | bubble gum | classical |
| Karen | old | bubble gum | rock |
| Harper | young | bubble gum | classical |
| Age | Favorite | Freq |
|---|---|---|
| old | bubble gum | 2 |
| old | coffee | 4 |
| young | bubble gum | 7 |
| young | coffee | 1 |
```
##        Favorite
## Age     bubble gum coffee
##   old            2      4
##   young          7      1
```
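The three layouts above hold the same information: case form (one row per person), frequency form (one row per combination with a Freq column), and a contingency table. The sketch below moves between them using base R. The names `people`, `tab`, `freq_form` and `tab2` are illustrative.

```r
# Case form: one row per person (Age and Favorite columns of the Name table)
people <- data.frame(
  Age = c("young", "old", "young", "old", "young", "young", "old",
          "young", "young", "old", "young", "old", "old", "young"),
  Favorite = c("bubble gum", "coffee", "bubble gum", "coffee", "bubble gum",
               "bubble gum", "coffee", "coffee", "bubble gum", "coffee",
               "bubble gum", "bubble gum", "bubble gum", "bubble gum")
)

# Contingency table; column names become the dimension names
tab <- table(people)
tab

# Frequency form: one row per Age x Favorite combination, with a Freq column
freq_form <- as.data.frame(tab)
freq_form

# And back again from frequency form to a table
tab2 <- xtabs(Freq ~ Age + Favorite, data = freq_form)
```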
→ MosaicCoding.R
Source: “Perceived losses of scientific integrity under the Trump administration: A survey of federal scientists”
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231929
HH package
(Note: these examples are designed to highlight one parameter at a time. They should not be taken as complete examples as they do not follow all best practices.)
Abbreviate labels, one setting per variable in order of variable splits
labeling = labeling_border(abbreviate_labs = c(FALSE, 3, 6))
See ?vcd::labelings
Change spacing between factor levels, one setting per variable in order of variable splits
spacing = spacing_dimequal(c(.5, .1, 0))
See ?vcd::spacings
Rename variables with set_varnames inside the labeling function
labeling = labeling_border(set_varnames = c(...))
Change angle of displayed factor levels, one setting per side, starting with top (top, right, bottom, left)
rot_labels = c(0, 0, 0, 0)
If labeling = is included, rot_labels should go inside the labeling function, for example:
labeling = labeling_border(rot_labels = c(0, 0, 0, 0), ...)
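The sketch below puts the parameters above into a single vcd::mosaic() call on the Age × Favorite × Music data from the Name table. The new variable name "Favorite flavor" is a hypothetical choice for illustration. The plot runs only if the vcd package is installed.

```r
# Case form: Age, Favorite and Music for the 14 people in the Name table
people <- data.frame(
  Age = c("young", "old", "young", "old", "young", "young", "old",
          "young", "young", "old", "young", "old", "old", "young"),
  Favorite = c("bubble gum", "coffee", "bubble gum", "coffee", "bubble gum",
               "bubble gum", "coffee", "coffee", "bubble gum", "coffee",
               "bubble gum", "bubble gum", "bubble gum", "bubble gum"),
  Music = rep(c("rock", "classical"), 7)
)

# Three-way table; variables are split in the order Age, Favorite, Music
tab3 <- xtabs(~ Age + Favorite + Music, data = people)

if (requireNamespace("vcd", quietly = TRUE)) {
  vcd::mosaic(tab3,
    labeling = vcd::labeling_border(
      abbreviate_labs = c(FALSE, 3, 6),               # one per variable, split order
      set_varnames = c(Favorite = "Favorite flavor"), # old name = new name (hypothetical)
      rot_labels = c(0, 0, 0, 0)),                    # top, right, bottom, left
    spacing = vcd::spacing_dimequal(c(.5, .1, 0)))    # gap per variable, split order
}
```

Because labeling = is supplied here, rot_labels sits inside labeling_border() instead of being passed to mosaic() directly.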